{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Q Learning (Off Policy)\n",
    "\n",
    "We will be using **TD control method of Q Learning** on Cliff World environment as given below:  \n",
    "\n",
    "![GridWorld](./images/cliffworld.png \"Cliff World\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q Learning Update equation\n",
    "\n",
    "Q Learning control is carried by sampling step by step and updating Q values at each step. We use ε-greedy policy to explore and generate samples. However, the policy learnt is a deterministic greedy policy with no exploration. The Update equation is given below:\n",
    "\n",
    "$$ \n",
    "\\DeclareMathOperator*{\\max}{max} Q(S,A) \\leftarrow Q(S,A) + \\alpha * [ R + \\gamma * \\max_{A'} Q(S’,A’) – Q(S,A)] $$\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial imports and enviroment setup\n",
    "import sys\n",
    "import gym\n",
    "import gym.envs.toy_text\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Q- Learning agent class\n",
    "from collections import defaultdict\n",
    "\n",
    "\n",
    "class QLearningAgent:\n",
    "    def __init__(self, alpha, epsilon, gamma, get_possible_actions):\n",
    "        self.get_possible_actions = get_possible_actions\n",
    "        self.alpha = alpha\n",
    "        self.epsilon = epsilon\n",
    "        self.gamma = gamma\n",
    "        self._Q = defaultdict(lambda: defaultdict(lambda: 0))\n",
    "\n",
    "    def get_Q(self, state, action):\n",
    "        return self._Q[state][action]\n",
    "\n",
    "    def set_Q(self, state, action, value):\n",
    "        self._Q[state][action] = value\n",
    "\n",
    "    # Q learning update step\n",
    "    def update(self, state, action, reward, next_state, done):\n",
    "        if not done:\n",
    "            best_next_action = self.max_action(next_state)\n",
    "            td_error = reward + \\\n",
    "                       self.gamma * self.get_Q(next_state, best_next_action) \\\n",
    "                       - self.get_Q(state, action)\n",
    "        else:\n",
    "            td_error = reward - self.get_Q(state, action)\n",
    "\n",
    "        new_value = self.get_Q(state, action) + self.alpha * td_error\n",
    "        self.set_Q(state, action, new_value)\n",
    "\n",
    "    # get best A for Q(S,A) which maximizes the Q(S,a) for actions in state S\n",
    "    def max_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "        best_action = []\n",
    "        best_q_value = float(\"-inf\")\n",
    "\n",
    "        for action in actions:\n",
    "            q_s_a = self.get_Q(state, action)\n",
    "            if q_s_a > best_q_value:\n",
    "                best_action = [action]\n",
    "                best_q_value = q_s_a\n",
    "            elif q_s_a == best_q_value:\n",
    "                best_action.append(action)\n",
    "        return np.random.choice(np.array(best_action))\n",
    "\n",
    "    # choose action as per epsilon-greedy policy for exploration\n",
    "    def get_action(self, state):\n",
    "        actions = self.get_possible_actions(state)\n",
    "\n",
    "        if len(actions) == 0:\n",
    "            return None\n",
    "\n",
    "        if np.random.random() < self.epsilon:\n",
    "            a = np.random.choice(actions)\n",
    "            return a\n",
    "        else:\n",
    "            a = self.max_action(state)\n",
    "            return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot rewards\n",
    "def plot_rewards(env_name, rewards, label):\n",
    "    plt.title(\"env={}, Mean reward = {:.1f}\".format(env_name,\n",
    "                                                    np.mean(rewards[-20:])))\n",
    "    plt.plot(rewards, label=label)\n",
    "    plt.grid()\n",
    "    plt.legend()\n",
    "    plt.ylim(-300, 0)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# training algorithm\n",
    "def train_agent(env, agent, episode_cnt=10000, tmax=10000, anneal_eps=True):\n",
    "    episode_rewards = []\n",
    "    for i in range(episode_cnt):\n",
    "        G = 0\n",
    "        state = env.reset()\n",
    "        for t in range(tmax):\n",
    "            action = agent.get_action(state)\n",
    "            next_state, reward, done, _ = env.step(action)\n",
    "            agent.update(state, action, reward, next_state, done)\n",
    "            G += reward\n",
    "            if done:\n",
    "                episode_rewards.append(G)\n",
    "                # to reduce the exploration probability epsilon over the\n",
    "                # training period.\n",
    "                if anneal_eps:\n",
    "                    agent.epsilon = agent.epsilon * 0.99\n",
    "                break\n",
    "            state = next_state\n",
    "    return np.array(episode_rewards)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# helper fucntion to print policy for Cliff world\n",
    "def print_policy(env, agent):\n",
    "    nR, nC = env._cliff.shape\n",
    "\n",
    "    actions = '^>v<'\n",
    "\n",
    "    for y in range(nR):\n",
    "        for x in range(nC):\n",
    "            if env._cliff[y, x]:\n",
    "                print(\" C \", end='')\n",
    "            elif (y * nC + x) == env.start_state_index:\n",
    "                print(\" X \", end='')\n",
    "            elif (y * nC + x) == nR * nC - 1:\n",
    "                print(\" T \", end='')\n",
    "            else:\n",
    "                print(\" %s \" %\n",
    "                      actions[agent.max_action(y * nC + x)], end='')\n",
    "        print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    This is a simple implementation of the Gridworld Cliff\n",
      "    reinforcement learning task.\n",
      "\n",
      "    Adapted from Example 6.6 (page 106) from Reinforcement Learning: An Introduction\n",
      "    by Sutton and Barto:\n",
      "    http://incompleteideas.net/book/bookdraft2018jan1.pdf\n",
      "\n",
      "    With inspiration from:\n",
      "    https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py\n",
      "\n",
      "    The board is a 4x12 matrix, with (using Numpy matrix indexing):\n",
      "        [3, 0] as the start at bottom-left\n",
      "        [3, 11] as the goal at bottom-right\n",
      "        [3, 1..10] as the cliff at bottom-center\n",
      "\n",
      "    Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward\n",
      "    and a reset to the start. An episode terminates when the agent reaches the goal.\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# create cliff world environment\n",
    "env = gym.envs.toy_text.CliffWalkingEnv()\n",
    "print(env.__doc__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a Q Learning agent\n",
    "agent = QLearningAgent(alpha=0.25, epsilon=0.2, gamma=0.99, \n",
    "                       get_possible_actions=lambda s : range(env.nA))\n",
    "\n",
    "#train agent and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt = 5000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAiQ0lEQVR4nO3de5xU5Z3n8c+XlpsCXtCA2igwwRhwMYZWVBJtJyYSY5Q4mpCZjBBmh3hhZ3YnrzU4TDIxCZubM5l1fakhbsKYmDEqOhIVo8YUKGoUDaJIVFDUDgiKq0Dk3r/945yORVN9qa6qru463/frdV5UPc85z3l+RfWvznnOTRGBmZllS59qd8DMzLqfk7+ZWQY5+ZuZZZCTv5lZBjn5m5llkJO/mVkGOflbuyRNl/Rw3vutkkanrwdK+qWkdyTdmpZ9S9Kbkl6vQl/XSjqzjbpGSU3d3adaJ2mkpJC0X7X7YsVx8jcknSVpiaQtkt6QtFjSuYXmjYhBEfFS+vYCYBgwNCIulDQC+DIwNiKGF1jP85I+m/d+Upo4WpdtrWYySX8oQtLtrcqPT8tzVepaZkk6Q9Jv0g2NtQXqf5N+dzdLelrSee20JUnflbQpnb4nSRUNoAdy8s84SRcAtwI3AvUkyfxrwKc7sfjRwAsRsTvv/aaI2NjG/EuA0/Penwb8vkDZI3ltdiaGSvxQvAGcKmloXtk04IUKrKtokuqqsM5qbt3/Efgx8D/bqP974PCIGALMBH4m6fA25p0JTAGOB8YD5wBfKmtvewEn/yqTdISkBelWy8uS/i6v7uuSbpF0Y7pVvlJSQ1o3W9Jtrdr635KuLmLdAv4V+GZE3BAR70REc0Qsjoi/bWOZkPR+SVeS/Eh8Lt1S/xJwP3BE+n5+gcWXkCT3Fh8FvlugbEm6rnPTmN+WlJP0wbx+rJX0FUkrgD+2TkzpkNR8Sf9P0nPAiZ39XFI7gf8Epqbt1QGfBW5qtZ5jJd0v6a0CezafkvS7dGv0NUlfz6trGS6ZJunVdKhsTludSWO5TtI9kv4InNHWd0fSAEnbJB2avv8nSbslDUnff0vSvxXRx7+R9CrwoKQ6SVel/X0J+FSRn2uXRMTjEfFT4KU26lfkbTAE0BcY0UZz04B/iYimiPgD8C/A9DJ3ueeLCE9Vmkh+fJ8kSaL9gNEkX+6z0vqvA9uBs4E64NvAY2nd0cC7wJD0fR2wHjg5fX8t8HYb04p0nmNJ/lBGtdPH6cDDee8DeH9e/36WV9cINLXT1lFAM3BIGvtGYCDwWl7Z2yQ/BseQbO19nOQP+XJgNdAvbWstsJzkD3xgXtmZ6evvAA+l7Y4Anm2vb6362Qg0AacCv03LzgZ+BfxXIJeWHZD2/YvAfsCHgTeBcXnt/Jc0rvHABmBKWjcy/Sx/lH4GxwM7gA+20af5wDvApLS9/Wn/u7ME+Iv09X3AGuCTeXWfKaKPN6axDgQuJtlbG5F+tr9J59mvjX7fRdvfw7u68DdzJrC2nXVtT/tzL9CnjfneASbmvW8AtlQ7H3T35C3/6joROCwivhEROyMZS/8R6dZm6uGIuCci9gA/JUkSRMQrwFMku68Afw68GxGPpfWXRsRBbUzj02VahjTWVzTKVES8CrxKsnV/PPBiRGwDluaVDQB+C3wOuDsi7o+IXcBVJMnn1Lwmr46I19I2WvssMDci3oqI14BO7xHl9fcR4BBJHwAuIkmC+c4hSUQ/iYjdEfEUsIDkWAgRkYuIZyLZm1oB/Ad7D3EBXBkR2yLiaeDp9DNoy50RsTQimkkSdnvfncXA6eke0fg0/tMlDSD53j1URB+/HhF/TD/nzwL/ln7ub5FskLT3GZ7TzvfwnPaWLVba3mDSH+r0cypkEMkPQIt3gEFZG/d38q+uo0mGSd5umYB/JBl3b5F/1sy7wIC8IY6fA59PX/9l+r4Ym9J/2xobrYSWoZ/TSBMQ8HBe2W8jYgdwBPBKy0LpH/JrwJF5bb3WznqOaFX/SlszduCnwCzgDOCOVnVHAxNb/f/9FTAcQNLEvAOR75BsNR/aqo3W/7+D2ulLfjwdfXcWk2zVfxh4hmRI7nTgZGB1RLxZRB/z11uuz7VNkv4xHTrcKun6YpaNiF0RsQg4S22ctABsBYbkvR8CbI10NyArnPyr6zXg5VZbQ4Mj4uxOLn8r0CipHvgMeclf0vV5f0Ctp5XpbM+nffiLcgbVgZbk/1HeS/4P5ZUtScvWkSQ44E/HJ0YAf8hrq70/1vXsPeZ7VBf7+1PgUuCeiHi3Vd1rwOJW/3+DIuKStP7nwEJgREQcCFwPlLJ1mR9vR9+dR4APkHwvFkfEcySfwadIfhhadKaP+est6nOVtKid7+GigkFG/K/0cxwUERe313479gP+rI26ley9h3V8WpYpTv7V9TiwOT1wOTA9mHacpE4dnIyIN4Ac8BOSRLAqr+7ivD+g1tO4dJ4A/gH4qqQvShoiqY+kj0iaV/ZoE0uAE0i2QpemZc8Ao0i2rluS/y3ApyR9TFJfklNId5Aktc64BbhC0sHpj+N/y69MD6DO76iRiHg57Wuhg7F3AcdI+mtJfdPpxLwD04OBtyJiu6STSPbOyqXd7076Q/UkcBnvJftHSM5qyU/+xfbxFuDvJNVLOhiY3d7MEfHJdr6Hn+xssOn3cgDJ8R+lB7X7pXXHSvpk+jn0lfQFko2JxW00dyPwD5KOlHQEyXdrfmf7Uiuc/KsoHcf/NPAh4GWSg4U3AAcW0czPSQ6CFTvk09KH20jG12eQbG1vAL4F3NmV9jqxvhdIDvSuj4i307JmkmQ2hDS5R8TzwBeA/0PyuXwa+HRE7Ozkqq4kGZJ4meSA509b1Y/gvR+fjvr8cESsK1C+BfgEyTj7OpIhnO8C/dNZLgW+IWkLyYHZWzrZ9870qTPfncUkyfLxvPeDee8Htit9/BHJge+nSY453d7+7GVzGrANuIdkb2Mbyf8rJHsqXyf5Xr1Bctrn59JjMEj6qKSteW39EPglyUbHs8DdaVmmKGPDXGakW4xPA+PTg8lmmePkb2aWQVUb9pE0WclFMasltTtuaGZm5VWVLX8lV0u+QHIBTxPwBPD59IwEMzOrsGpt+Z9Ecq7xS+kBvJuBNm/EZGZm5VWtGzUdyd4XijQBE1vPJGkmyU2YGDhw4IQRI9q6VUf7mpub6dMnWyc2OeZsyFrMWYsXSo/5hRdeeDMiDmtdXq3kX+hCl33GnyJiHjAPoKGhIZYtW9alleVyORobG7u0bG/lmLMhazFnLV4oPWZJBa/CrtZPaBN7XyVYT3KetJmZdYNqJf8ngDGSRqXnXE8lucTczMy6QVWGfSJit6RZJFcK1gE/jojM3VvDzKxaqvZknoi4h+RSbTMz62bZOmxuZmaAk7+ZWSY5+ZuZZVDVxvyr5fV3tvPDJWv4ydK1ZW+73377/pbu3N3cbn2lNDc30+eBgs/KqFmOufZlLV5IYn520h4G9K0ra7uZSv7vbNvFyd/+dVnaGnpAPzb9ce9by8+YNGqf+a5fvKbd+kp59dVXOeqorj68qndyzLUva/FCEnNdn/I/XjhTyX/zttJu3f61c8byjbuSe89d+1cfZtX6zXz/V88z8tADOHHkIcz+5LH7LHP7U01s3LKDfzz7WGae1tZT5covl3udxsZ9+1PLHHPty1q8kMTct678owaZSv6nff83Rc0vQf5NT2d8ZBQ/eOAFtmzfzZEHD2Ti6KFM72Br/qD9+7Jxyw5OO2afW2uYmVVNpg74Fnv36qVf+fM265LniXeeSnput5lZeWUm+XfluQVHHDSwQEPFrrfo1ZqZVVxmkv+ty5rK2p63482sN8tM8r98wYryNFRk1i9ydMjMrFtkJvmXjYd9zKwGOPl3UbFb9N4DMLOexMm/SMVuyHvD38x6Iif/LvKpm2bWm2Ui+a98c0/Z2io25fsnwsx6okwk/6atzR3P1Eke9jGzWpCJ5P8fv9/Z8UxFKvqAb9l7YGbWdZlI/uVU7JXCXbmy2Mys0pz8u8hb8mbWm9V88t++q3wHe83MakXNJ/9rHlxdmYZ9kZeZ9WI1n/y3bC/tAS5mZrWo5pN/uQ+3+lRPM6sFNZ/8K8VX+JpZb+bkb2aWQTWf/Ct1mn3xB3C9p2BmPUfFkr+kr0v6g6Tl6XR2Xt0VklZLel7SWZXqg5mZFbZfhdv/QURclV8gaSwwFRgHHAE8IOmYiKjICflR5kOuRe9J+IivmfVA1Rj2OQ+4OSJ2RMTLwGrgpCr0oyQexDGz3qzSyX+WpBWSfizp4LTsSOC1vHma0jIzM+smJQ37SHoAGF6gag5wHfBNkoGPbwL/Asyg8EZzwcERSTOBmQDDhg0jl8sV3cd1f9hR9DItWq8vl8uxpzkZnXrkkUcY3K/j7f93330XgCcef5ymQd23o7V169YufV69mWOufVmLFyoXc0nJPyLO7Mx8kn4E3JW+bQJG5FXXA+vaaH8eMA+goaEhGhsbi+7jA28/A6+9WvRyAI2NjXDv3Xu9r/v1vbBnD5MmTeKQA/p12Mb+y3Lw7h85aeJJ/Nlhg7rUj67I5XJ05fPqzRxz7ctavFC5mCt5ts/heW8/Azybvl4ITJXUX9IoYAzweKX6Ue5TPYs9gOzjvWbWE1XybJ/vSfoQSf5bC3wJICJWSroFeA7YDVxWqTN9KskHfM2sN6tY8o+Iv26nbi4wt1LrNjOz9tX+Fb4VatePcTSz3qzmk7+Zme2r5pN/2Q/4Ftmen+FrZj1RzSf/SvEtnc2sN3PyNzPLICf/rir6Gb7eUzCznsPJ38wsgzKQ/Mt8S+eqrt3MrDwykPwrw6M4ZtabOfmbmWVQzSf/ij3Dt8Lzm5lVUs0n/7Ir+iKvynTDzKwUNZ/8K7blX+Sgv48RmFlPUvPJv6fwHoCZ9SRO/mZmGeTk30VFH/D1sI+Z9SBO/kUq/jGOHu8xs56n5pP/L5a9VpF2i3+Yizf9zaznqPnk31N4D8DMehInfzOzDHLy76Jih3E87GNmPYmTf5GKf4xjZfphZlYKJ/8uKvqArzf8zawHcfLvJt4DMLOexMnfzCyDnPy7iYd9zKwncfIvUtGPcfRwj5n1QE7+XeQteTPrzUpK/pIulLRSUrOkhlZ1V0haLel5SWfllU+Q9Exad7WKvTG+mZmVrNQt/2eB84El+YWSxgJTgXHAZOBaSXVp9XXATGBMOk0usQ9mZlakkpJ/RKyKiOcLVJ0H3BwROyLiZWA1cJKkw4EhEfFoRARwIzCllD5Ui6/YNbPebL8KtXsk8Fje+6a0bFf6unV5QZJmkuwlMGzYMHK5XNk72p7W68vlcjQ3J0dwlyxZzH59Ov4B2L59OwCPPfYYa/bvvkMsW7du7fbPq9occ+3LWrxQuZg7TP6SHgCGF6iaExF3trVYgbJop7ygiJgHzANoaGiIxsbG9jtbyL13F79MqrGxca/lGxsb0a/uJgJOP/10+tZ1nMwHPPYgbN/GKaecTP3B+3e5L8XK5XJ06fPqxRxz7ctavFC5mDtM/hFxZhfabQJG5L2vB9al5fUFymueT/k0s56kUuMQC4GpkvpLGkVyYPfxiFgPbJF0cnqWz0VAW3sPZmZWIaWe6vkZSU3AKcDdkn4FEBErgVuA54B7gcsiYk+62CXADSQHgdcAi0rpQ7X4Gb5m1puVdMA3Iu4A7mijbi4wt0D5MuC4UtZbTcVf4evxHjPreXyFbxcVe22ar2Uzs57Eyd/MLIOc/LvI2/Fm1ps5+ZuZZZCTf+rCCfUdz0QXnuHbhb6YmVWak38XFf0M38p0w8ysS5z8izR4QHJ2bGf3AAb1T+bv47N9zKwHqdSN3XqdQwb169R8d142iYdefJM+nbipG8BPvngii555neEHDiile2ZmZeXkD/zwryewbeeeduc5efQhAIw+bBCjDxvU6bbrD96fvz1tdEn9MzMrNyd/4ISjDmL/fu1/FDfPPKWbemNmVnke808N6r8f9/2P0zhn/OHV7oqZWcU5+ec5ZthgrvnLD+9Vdmnjn3HIAZ07HmBm1lt42KcDl08+lssnH1vtbpiZlZW3/M3MMsjJHz+M3cyyx8nfzCyDnPzNzDLIyd/MLIOc/M3MMsjJ38wsg5z8zcwyyMnfzCyDMpn8r//ChGp3wcysqjKZ/CcfN5wfXdRQ7W6YmVVNJpM/wMfHDmOob9hmZhmV2eSfz09YNLOscfI3M8ugkpK/pAslrZTULKkhr3ykpG2SlqfT9Xl1EyQ9I2m1pKslb3ebmXW3Urf8nwXOB5YUqFsTER9Kp4vzyq8DZgJj0mlyiX0wM7MilZT8I2JVRDzf2fklHQ4MiYhHIyKAG4EppfTBzMyKV8kneY2S9DtgM/BPEfEQcCTQlDdPU1pWkKSZJHsJDBs2jFwuV5aOtbSzc9dOAB5Z+ghD+u87+lSu9VXD1q1be3X/u8Ix176sxQuVi7nD5C/pAWB4gao5EXFnG4utB46KiE2SJgD/KWkcFHxqSrS17oiYB8wDaGhoiMbGxo66u697796nqKWdfg/dDzt3cuqkUzl0UP99lunS+nqIXC7Xq/vfFY659mUtXqhczB0m/4g4s9hGI2IHsCN9/aSkNcAxJFv69Xmz1gPrim3fzMxKU5FTPSUdJqkufT2a5MDuSxGxHtgi6eT0LJ+LgLb2HszMrEJKPdXzM5KagFOAuyX9Kq06DVgh6WngNuDiiHgrrbsEuAFYDawBFpXSBzMzK15JB3wj4g7gjgLlC4AFbSyzDDiulPWWS5sHG8zMapyv8KXwUWgzs1rm5G9mlkFO/nj4x8yyJ9PJ38M9ZpZVmU7+3uI3s6zKdPJv4T0AM8saJ38zswxy8jczyyAnfzOzDMp08k8eKWBmlj2ZTv4t/CRJM8saJ38zswxy8sfDP2aWPZlO/h7uMbOsynTy9xa/mWVVppN/C+8BmFnWOPmbmWWQk7+ZWQY5+ZuZZZCTv5lZBmU6+ftcHzPLqkwn/xY+18fMssbJH+8BmFn2ZDr5e4vfzLIq08nfzCyrMp38PdxjZlmV6eTfwsM/ZpY1JSV/Sd+X9HtJKyTdIemgvLorJK2W9Lyks/LKJ0h6Jq27Wr6xjplZtyt1y/9+4LiIGA+8AFwBIGksMBUYB0wGrpVUly5zHTATGJNOk0vsg5mZFamk5B8R90XE7vTtY0B9+vo84OaI2BERLwOrgZMkHQ4MiYhHI7mf8o3AlFL6YGZmxduvjG3NAH6Rvj6S5MegRVNatit93bq8IEkzSfYSGDZsGLlcriwdbWln165dACxdupRB/fYdfSrX+qph69atvbr/XeGYa1/W4oXKxdxh8pf0ADC8QNWciLgznWcOsBu4qWWxAvNHO+UFRcQ8YB5AQ0NDNDY2dtTdfd179z5FLe3st/g+2LWLj3xkEgft32+fZbq0vh4il8v16v53hWOufVmLFyoXc4fJPyLObK9e0jTgHOBj8d6jsZqAEXmz1QPr0vL6AuVV5Qd6mVnWlHq2z2TgK8C5EfFuXtVCYKqk/pJGkRzYfTwi1gNbJJ2cnuVzEXBnKX0ohc8zMrOsKnXM/xqgP3B/esbmYxFxcUSslHQL8BzJcNBlEbEnXeYSYD4wEFiUTmZm1o1KSv4R8f526uYCcwuULwOOK2W9ZmZWmkxf4euxfjPLqkwn/xYe+zezrHHyNzPLICd/M7MMcvI3M8ugTCf/8BFfM8uoTCd/M7OsynTy96MEzCyrMp38zcyyysnfzCyDMp38fcDXzLIq08m/hfwIdzPLmHI+yatmPHT5GWzcsqPa3TAzqxgn/wJGHLI/Iw7Zv9rdMDOrGA/7mJllkJO/mVkGZTr5/+iiBs4aN4zBAzz6ZWbZkumsN3H0UCaOHlrtbpiZdbtMb/mbmWWVk7+ZWQY5+ZuZZZCTv5lZBjn5m5llkJO/mVkGOfmbmWWQk7+ZWQY5+ZuZZVBJyV/S9yX9XtIKSXdIOigtHylpm6Tl6XR93jITJD0jabWkq+UH6ZqZdbtSt/zvB46LiPHAC8AVeXVrIuJD6XRxXvl1wExgTDpNLrEPZmZWpJKSf0TcFxG707ePAfXtzS/pcGBIRDwayTMUbwSmlNIHMzMrXjnH/GcAi/Lej5L0O0mLJX00LTsSaMqbpyktMzOzbtThXT0lPQAML1A1JyLuTOeZA+wGbkrr1gNHRcQmSROA/5Q0Dgo+LLfNp6hLmkkyRMSwYcPI5XIddbdTytVOT7Z169ZMxJnPMde+rMULlYu5w+QfEWe2Vy9pGnAO8LF0KIeI2AHsSF8/KWkNcAzJln7+0FA9sK6ddc8D5gE0NDREY2NjR93d171371PUpXZ6mVwul4k48znm2pe1eKFyMZd6ts9k4CvAuRHxbl75YZLq0tejSQ7svhQR64Etkk5Oz/K5CLizlD6YmVnxSn2YyzVAf+D+9IzNx9Ize04DviFpN7AHuDgi3kqXuQSYDwwkOUawqHWjZmZWWSUl/4h4fxvlC4AFbdQtA44rZb1mZlYaX+FrZpZBTv5mZhnk5G9mlkFO/mZmGeTkb2aWQU7+ZmYZ5ORvZpZBTv5mZhlU6hW+ZmZt2rVrF01NTWzfvr0s7R144IGsWrWqLG31Fp2NecCAAdTX19O3b99Otevkb2YV09TUxODBgxk5ciTleGjfli1bGDx4cBl61nt0JuaIYNOmTTQ1NTFq1KhOtethHzOrmO3btzN06NCyJH5rmySGDh1a1B6Wk7+ZVZQTf/co9nN28jczyyAnfzOreU1NTZx33nmMGTOG0aNHM2vWLHbs2LHPfPPnz2fWrFnd1q+FCxfyne98p9vWl8/J38xqWkRw/vnnM2XKFF588UVefPFFtm3bxuWXX94t69+zZ0+bdeeeey6zZ8/uln605rN9zKxbXPnLlTy3bnNJbezZs4e6uro/vR97xBD++dPj2l3mwQcfZMCAAXzxi18EoK6ujh/84AccffTRzJ07l0GDBnW43p/97GdcffXV7Ny5k4kTJ3LttddSV1fHJZdcwhNPPMG2bdu44IILuPLKKwEYOXIkM2bM4L777mPWrFnMnj2badOm8ctf/pJdu3Zx6623cuyxxzJ//nyWLVvGNddcw/Tp0xkyZAjLli3j9ddf53vf+x4XXHABzc3NXHrppSxevJhRo0bR3NzMjBkzuOCCC0r4JL3lb2Y1buXKlUyYMGGvsiFDhjBy5EhWr17d4fKrVq3iF7/4BUuXLmX58uXU1dVx0003ATB37lyWLVvGihUrWLx4MStWrPjTcgMGDODhhx9m6tSpABx66KE89dRTXHLJJVx11VUF17V+/Xoefvhh7rrrrj/tESxcuJC1a9fyzDPPcMMNN/Doo4926XNozVv+ZtYtOtpC74yunOcfEQXPhImITi3/61//mieffJITTzwRgG3btvG+970PgFtuuYV58+axe/du1q9fz3PPPcf48eMB+NznPrdXO+effz4AEyZM4Pbbby+4rilTptCnTx/Gjh3Lhg0bAHj00Ue58MIL6dOnD8OHD+eMM87oVL874uRvZjVt3LhxLFiw91NlN2/ezIYNG1i6dCnTp08H4J577im4fEQwbdo0vv3tb+9V/vLLL3PVVVfxxBNPcPDBBzN9+vS9zrM/4IAD9pq/f//+QDLstHv37oLrapmnZb35/5abh33MrKZ97GMf49133+XGG28EkuMGX/7yl5k1axaXXXYZy5cvZ/ny5RxxxBFtLn/bbbexceNGAN566y1eeeUVNm/ezAEHHMCBBx7Ihg0bWLRoUUX6f8opp7BgwQKam5vZsGEDuVyuLO06+ZtZTZPEHXfcwW233caYMWMYOnQoffr0Yc6cOQXnnz9/PvX19X+ahgwZwre+9S0+8YlPMH78eD7+8Y+zfv16jj/+eE444QTGjRvHjBkzmDRpUkX6f95551FfX89xxx3Hl770JSZOnMiBBx5Ycruq1C5FuTU0NMSyZcuKXm7k7Lv3KVv7nU+Vo0s9Wi6Xo7Gxsdrd6FaOuedZtWoVH/zgB8vWXjnu7fPII4/w+c9/nttvv32fA8E90ZYtW5DEoEGD2LRpEyeddBJLly5l+PDh+8xb6POW9GRENLSe12P+ZpYpp556Kq+88kq1u1GUc845h7fffpudO3fy1a9+tWDiL5aTv5lZD1eucf58HvM3s4rqLUPLvV2xn7OTv5lVzIABA9i0aZN/ACqs5X7+AwYM6PQyHvYxs4qpr6+nqamJN954oyztbd++vagEVws6G3PLk7w6y8nfzCqmb9++nX6yVGfkcjlOOOGEsrXXG1Qq5pKGfSR9U9IKScsl3SfpiLy6KyStlvS8pLPyyidIeiatu1p+0oOZWbcrdcz/+xExPiI+BNwFfA1A0lhgKjAOmAxcK6nlVnzXATOBMek0ucQ+mJlZkUpK/hGRf3/WA4CWozrnATdHxI6IeBlYDZwk6XBgSEQ8GskRoBuBKaX0wczMilfymL+kucBFwDtAy+3mjgQey5utKS3blb5uXd5W2zNJ9hIAtkp6vovdPBR480/tfreLrfQue8WcEY659mUtXig95qMLFXaY/CU9ABS6nGxORNwZEXOAOZKuAGYB/wwUGsePdsoLioh5wLyO+tgRScsKXd5cyxxzNmQt5qzFC5WLucPkHxFndrKtnwN3kyT/JmBEXl09sC4try9QbmZm3ajUs33G5L09F/h9+nohMFVSf0mjSA7sPh4R64Etkk5Oz/K5CLizlD6YmVnxSh3z/46kDwDNwCvAxQARsVLSLcBzwG7gsohoeYrxJcB8YCCwKJ0qreSho17IMWdD1mLOWrxQoZh7zS2dzcysfHxvHzOzDHLyNzPLoJpO/pImp7eXWC1pdrX7UwpJP5a0UdKzeWWHSLpf0ovpvwfn1fX622tIGiHpN5JWSVop6e/T8pqNW9IASY9LejqN+cq0vGZjBpBUJ+l3ku5K39d6vGvTvi6XtCwt696YI6ImJ6AOWAOMBvoBTwNjq92vEuI5Dfgw8Gxe2feA2enr2cB309dj03j7A6PSz6EurXscOIXkmotFwCerHVs7MR8OfDh9PRh4IY2tZuNO+zcofd0X+C1wci3HnPb1H0hOF78rI9/ttcChrcq6NeZa3vI/CVgdES9FxE7gZpLbTvRKEbEEeKtV8XnAv6ev/533bpVRE7fXiIj1EfFU+noLsIrkivCajTsSW9O3fdMpqOGYJdUDnwJuyCuu2Xjb0a0x13LyPxJ4Le99u7eS6KWGRXLtBOm/70vL24r9SIq4vUZPImkkcALJlnBNx50OgSwHNgL3R0Stx/xvwOUkp4y3qOV4IflBv0/Sk+ltbKCbY67l+/kXdSuJGlOW22v0FJIGAQuA/x4Rm9sZ1qyJuCO5JuZDkg4C7pB0XDuz9+qYJZ0DbIyIJyU1dmaRAmW9Jt48kyJinaT3AfdL+n0781Yk5lre8m/rFhO1ZEO660f678a0vGZuryGpL0nivykibk+Laz5ugIh4G8iR3Pa8VmOeBJwraS3J0OyfS/oZtRsvABGxLv13I3AHyTB1t8Zcy8n/CWCMpFGS+pE8X2BhlftUbguBaenrabx3q4yauL1G2sf/C6yKiH/Nq6rZuCUdlm7xI2kgcCbJbVNqMuaIuCIi6iNiJMnf6IMR8QVqNF4ASQdIGtzyGvgE8CzdHXO1j3pXcgLOJjlDZA3JXUir3qcSYvkPYD3v3Rb7b4ChwK+BF9N/D8mbf04a9/PknQEANKRftDXANaRXeffECfgIyW7sCmB5Op1dy3ED44HfpTE/C3wtLa/ZmPP628h7Z/vUbLwkZyA+nU4rW3JTd8fs2zuYmWVQLQ/7mJlZG5z8zcwyyMnfzCyDnPzNzDLIyd/MLIOc/M3MMsjJ38wsg/4/yBUhVgAgvFIAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot rewards\n",
    "plot_rewards(\"Cliff World\",rewards, 'Q-Learning')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " >  >  >  ^  ^  >  >  >  v  v  v  v \n",
      " ^  >  >  ^  >  <  v  >  v  v  >  v \n",
      " >  >  >  >  >  >  >  >  >  >  >  v \n",
      " X  C  C  C  C  C  C  C  C  C  C  T \n"
     ]
    }
   ],
   "source": [
    "# print policy \n",
    "print_policy(env, agent)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q Learning for \"Taxi\" environment "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAh20lEQVR4nO3dfZxV1X3v8c93BhhQQAUVgSEOxhELRlEmPiRRJ4pKjFG02mpuoiRNqDSa5iavGzXYRmtsE9PeNNZGL8aUmpoYi0GsxiRSOxpfagzEJxBBiBhH8AljYEQQ8Hf/2HvgMHPOPJ05c2bO/r5fr/1in7X2w2/NDL+zz9prr6OIwMzMsqWq3AGYmVnfc/I3M8sgJ38zswxy8jczyyAnfzOzDHLyNzPLICd/G3Ak3SfponLHUWkkzZf0jXLHYX3Dyd9KStJySS3pskPSlpzXX+vJMSPiYxHx7108f5OkkHREm/K70vLGnsRgXaPENyS9LOmP6e9jSoFtj8/522hdQtKf9nXcWeDkbyUVEVMiYnhEDAd+BVzS+joi/r6PwlgFXNj6QtJo4Fjg9T46f0GSBpXpvNV9dKrzgM8CxwOjgEeBH+bbMCJ+lfO3MRw4A2gBft5HsWaKk38GSBon6U5Jr0t6QdIXc+quknSHpFslbUqv1BvSusslLWhzrO9Kur4XYnq/pAckbZD0hqTbJO2dU/empKNy4n+j9So9vXr8XDdOdxvw5zkJ7wJgIfBuTjxVaXvXpDHdIWlUTv1/SnolvXp9KPfqNe0u+VdJ96Y/w19Len+BdtelV7N/Ien3wANp+WclrZD0B0m/kHRgWn61pH9J1wdLelvSdenrYeknqX26GOONkn4m6W3go5KOlPTbNOafAEO78TPtqonAwxHxu4jYAfwHMLmL+14ELIiIt0sQV+Y5+Vc4SVXAfwFPAeOBk4EvSTotZ7MzgduBvYG7gRvS8h8Dp0samR6rGvgz4Efp6+9JeqvA8nRnoQH/AIwD/gSYAFwFEBFrgMuA2yTtAfwbMD8imnr4Y1gHPAucmr6+ELi1zTZfBGYCJ6Yx/QH415z6+4B6YH/gtyRvKLkuAK4G9gFWA9d2EtOJJO0+TdJM4GvAOcB+JJ+Qfpxu9yDQmK5/EHgl3RfgOGBlRPyhizF+Mo1rBPA4cBfJVfgo4D+Bgt0rkj7Swe/6LUkfKbDr7cDBkg6RNJgkoXd6JZ/+3s8FutS9Zz0QEV4qeAGOAX7fpuwK4N/S9auAxTl1k4F3cl4/DFyYrp8CrCkilibgcwXqZgJPtCm7G3gGeBqo6cpxCp0T+BRJQp0ErErrmoHGdH0FcHLOfmOBbcCgPMfcGwhgr/T1fOD7OfWnA88ViKcu3fegnLL7gL/IeV0FbAYOBIYBW4DRwOUkbxLNwHCSN5vrC5wnX4y35tSfQPKmqJyyR4Bv9PLf3xDgu2ks24EXgIld2O/T6bbqzXi87Fp85V/5DgTG5V6lkSSQMTnbvJKzvhkYmtMX/SOSq1pIrhx/1BtBSdpf0u1KbgRuJOkO2LfNZjcDhwH/EhFbizzlT4GTgEvJ3+d8ILAw52e0AtgBjJFULembaZfQRmBtuk9uvG1/hsM7ieelNuf+bs653yT5ZDQ+It4BlpBc7Z9A8kngEeDDadmDkHwq60KMueccB7wcaaZNvdhJzD3xdZJPLBNIupWuBh5Ir+w7chHJm5VnniwRJ//K9xLwQkTsnbOMiIjTu7j/fwKNkmqBs8lJ/pJuUvvRGa3L8k6O+w8kV4OHR8RIkitz5Rx7OPDPwC3AVbn97z0REZtJrrDnkD/5vwR8rM3PaWhEvEzypncWMB3Yi+Tqndx4exJSm3P/ZZtzD4uIR9L6B0neuI4EfpO+Pg04Gngo3aYrMeaecz0wXlJu/fsKBav8I3Fyl+ML7HoE8JOIaI6I7RExn6RrrGC/v6QJJF1dbbvmrBc5+Ve+x4GNki5LbxBWSzpM0ge7snNEvE7SdfJvJG8iK3LqLo6c0RltlrzD+XKMIBnJ8Zak8cD/aVP/XWBpRHwOuBe4Kd9Bcm6g1nWhOV8DToyItXnqbgKuzbnRup+ks3Ji3QpsAPYAenuU0k3AFa03aCXtJem8nPoHSe5TPBsR77KrK+uF9PfTkxgfJemG+aKkQZLOIXkzySvajMTJs/yqwK6/Ac6TNEbJTfVPA4NJ7osU8mngkUju/ViJOPlXuEhGWHwCmErSh/oG8H2Sq8Ou+hHJFWWvdPmkrgaOAv5Iktx/2lqRJt0ZwMVp0ZeBoyT9rzzHmUDSXfFyZyeMiHUR8XCB6u+S3GP4paRNwGMk90sguQJtPcezaV2viYiFwLeA29Mum2XAx3I2eYSk77/1Kv9ZkvsAD+Vs060Y0zeRc4BZJDe3/5yc30Ev+hbJYIMngbeA/w38aUS8BTsf2Gv7vMeF+EZvycldajaQSboSeD0i/l+5YzEbSJz8zcwyqGzdPpJmSFopabWky8sVh5lZFpXlyj99WGgVybjxZpKbQhdExLN9HoyZWQaV68r/aGB1JI98v0vyFOBZnexjZma9pCyTSpFMM5D7wEkzu0ZW7CRpNjAbYNiwYdMmTJjQo5O99957VFW1f5/buDV4c2t573nsOUi8vb3nMRw4soqtO+CVt9/brVzsPqi7bmTS/i3bYWiB3/rajckxxg2vYl3Le7sdo3X/XC+3vMe296B2eBWDOrmMeHcHDKqCqi6OjN+6A4ZUgboxkr7199xRGweibe8lP7fqPD+LQn/blSpr7YXi27xq1ao3ImK/tuXl+i+S7790uwwYEfOAeQANDQ2xZMmSHp2sqamJxsZG3mjZyqg9hvDVO5/mj+9s45iJo/jGvSs6P0AJffwDY7n3mfU93n/F359OdZWou/ze3coHCVrfU2761DRmHHZAp8dqPcbTV53KB676JcNrBtGydTsAK7/58Xbbt3YZqjsZuoRaf89ZkrU2Z629UHybJeV9crtcb6HNJOOzW9WSzDNSMq9t3ELDNxbzncWrWLC0mfuffbWUpyuJ4+vbzn6Q/0r6T8aOpCZ9Wz+6blSXEn8+nd0PktRvEr+ZdU+5kv9vgHpJEyUNAc4necCmZF7blEwN898rXivlabot2n/gKahmUPsp2Nsm33mfnsZ/XfJhRgxOyr9x9mHdjin3mFWCL3w07+zEZjaAlaXbJyK2S7oE+AVQDfwgIjqbCyZT9hhSzeZ3d7Qp7fyN4pTJY5DEXx81lNUcQP3+nc0vVlgAv/uH9t09Zjbwle22WET8DPhZuc7fXxTqWfmbMyZzxU+f2a3sxEn7s7jAJ5dzjhzPT594eedV+9jhVVzQ2NXvzNidO3KsN23bto3m5ma2bNlS9LH22msvVqwo7326vtbVNg8dOpTa2loGDx7cpeNW0JiIgalQ8h9SnfTInXDIfjy06nX2HFLNp455H39z1zIAqqvEjvd27XzduYdz9VmdzaXWsS9Nr+epl96iKn0DGV7jPw8rXnNzMyNGjKCurq7oe0SbNm1ixIgRvRTZwNCVNkcEGzZsoLm5mYkTJ3bpuP7f3Yf+9ozJ/N09uz/HVqjP/6yp43jxzc2c/oEDeGjV6+y9x5Dd/uP8/K+P59cvvLnz9aDqKkZUF3cL50vTD9m5/vVPTOakQ/cv6nhmAFu2bOmVxG+FSWL06NG8/nrXv5Y608m/nMM8v3v+VKaM24tv/+K5vPWDqqv48imH8NKbm/PW148ZQf2Y0l0BfebDXbt6MOsKJ/7S6+7PONPJv5xG7TmEg/cfnrfbZ23OmPpxew9jxpQDuLgxGXFz1Scm7xx7b2bWU9l6VK4fUXpbtbPxO9VV4qZPT2PqhL0BmPXhiVxyUn1pgzOrMM3NzZx11lnU19dz0EEHcckll7B1a/tvBl27di2HHdb94dE9tW7dOs4999w+O18uJ/8y6c74fjPruYjgnHPOYebMmTz//PM8//zzvPPOO3z1q1/tk/Nv3174k/q4ceNYsGBBn8TRlrt9yuzAUbt/j/WJh7SbgsPMivDAAw8wdOhQPvOZzwBQXV3Nd77zHQ488ECuvfZahg/v/FmYpUuX8uUvf5mWlhb23Xdf5s+fz9ixY7n55puZN28e7777LgcffDA//OEP2WOPPZg1axajRo3iiSee4KijjmLDhg2MHDmSJUuW8Morr3Dddddx7rnnsnbtWs444wyWLVvG/Pnzufvuu9m8eTNr1qzh7LPP5rrrrgPglltu4Vvf+hbjxo2jvr6empoabrjhhqJ+Lk7+ZVI3ek8AvjrjUL7/8As7ywfnm73LrEJc/V/LeXbdxh7vv2PHDqqrd3/SffK4kXz9E4WHOS9fvpxp06btVjZy5Ejq6upYvXo1U6dO7fCc27Zt49JLL2XRokXst99+/OQnP2Hu3Ln84Ac/4JxzzuHzn/88AFdeeSW33HILl156KQCrVq1i8eLFVFdXM2vWLNavX8/DDz/Mc889x5lnnpm3u+fJJ5/kiSeeoKamhkmTJnHppZfyzjvvcM011/Db3/6WESNGcNJJJ3HEEUd05cfVocwl/3J1tozba+huryekV/xDOpsO08yKEhF5R8J09btMVq5cybJlyzjllFOA5A1o7NixACxbtowrr7ySt956i5aWFk477bSd+5133nm7vVHNnDmTqqoqJk+ezKuv5p9b7OSTT2avvZKv1548eTIvvvgiL730EieeeCKjRo3aedxVq1Z1KfaOZC75l4uHupnR4RV6V/TkIa8pU6Zw55137la2ceNGXn31VSZNmtTp/hHBlClTePTRR9vVzZo1i7vuuosjjjiC+fPn09TUtLNuzz333G3bmpqa3Y6ZT+421dXVbN++vctvUt2VucvOcqVg536z8jj55JPZvHkzt956K5BcuX/lK1/hkksuYdiwYZ3uP2nSJF5//fWdyX/btm0sX55MRbZp0ybGjh3Ltm3buO2220oS/7Rp03jwwQf5wx/+wPbt29u9kfVU5pJ/uTj5m5WHJBYuXMiCBQuor69n9OjRVFVVMXfu3Lzbr1y5ktra2p3LokWLWLBgAZdddhlHHHEEU6dO5ZFHHgHgmmuu4ZhjjuGUU07h0EMPLUn848aN42tf+xrHHHMM06dPZ/LkyTu7horhbp8+Ik+XZlY2EyZM4O67k1njH3nkES644AKWLl3a7kZwXV0d27Zty3uMhx56qF3ZnDlzmDNnTrvy+fPnd/i6paVl5/mWLUvm65o1axazZs3auc0999wDJJ8uPvnJTzJ79my2b9/O2Wefzamnnlq4sV3k5N9HfOVv1j986EMf4sUX8365Vb911VVXsXjxYrZs2cKpp57KzJkziz6mk38J/VlDLXcsaQbYOVOmmVl3/eM//mOvH9N9/iV07EGj+ddPHgXAudNqd5af3OFsmX6TsMpTqhErtkt3f8ZO/iX28cPH8vy1H+OvGnd9FeKENk/1mlWyoUOHsmHDBr8BlFDrfP5Dhw7tfOOUu31KqPVvfXCR8+ybDWS1tbU0Nzd3a675QrZs2dKtBFcJutrm1m/y6ion/xIYv/cwXn7rnXKHYdYvDB48uMvfLtWZpqYmjjzyyF451kBRqjb7krSXzfpQHR+s26fcYZiZdShzyb+UvY4H7bsnV505paiRPSccsm8vRmRmll9mun36ZKRlkedYeuV0Ru05pHdiMTPrQGaSf+vN1xXrez6dbGfa5v62nzLG75PMI1I3Ov9on9HDa/KWm5n1tswk/z5V4BPAqZPH8OPPH8sxE0f1bTxmZm04+feib5+3+xcstB3XLInj3j+6L0MyM8srczd8S+mo9yWjfDyJm5n1d07+JeTnGc2svypZ8pd0laSXJT2ZLqfn1F0habWklZJO6+g4ZmbW+0rd5/+diNhtOjpJk4HzgSnAOGCxpEMiYkeJYzEzs1Q5un3OAm6PiK0R8QKwGji6DHGUjGdvNrP+rtTJ/xJJT0v6gaTWOQ/GAy/lbNOcllUed/qbWT9VVLePpMXAAXmq5gI3AteQpMBrgH8CPkv+UfB506Sk2cBsgDFjxtDU1NSjOFtaWnhx6ZIe7dsdrfG9+spWAJ5b+RxNb6/p9v69oaWlpVePNxC4zZUva+2F0rW5qOQfEdO7sp2km4F70pfNwISc6lpgXYHjzwPmATQ0NERjY2OP4mxqaqKu/kh45OEe7d9VrfHd+/pT8HIzh046lMYPTuh4J4Cf37vb/r2hqampV483ELjNlS9r7YXStbmUo33G5rw8G1iWrt8NnC+pRtJEoB54vFRxlFO438fM+qlSjva5TtJUki6dtcBfAkTEckl3AM8C24Ev9MVIH9+ENTPbpWTJPyI+3UHdtcC1pTp3/nP25dnMzPo3P+FbAv6UYWb9nZN/CfnThpn1V07+JeCJ3cysv3PyLyFf+JtZf+Xkb2aWQU7+ZmYZ5ORfAp869kAATjxkvzJHYmaWn7/GsQQ+ULsXa7/58XKHYWZWkK/8zcwyyMnfzCyDMpP8/dStmdkumUn+Zma2S2aSv6daMDPbJTPJ38zMdnHy7wWnTB7DsqtPK3cYZmZd5uTfC67708MZXuNHJsxs4HDy76aD9x/ermyfPYeUIRIzs55z8jczyyAn/24KDxsyswrg5G9mlkFO/kW6c85x5Q7BzKzbnPy7SW3miZh24KgyRWJm1nOZSf69NbeP+/zNrBJkJvmbmdkuTv5mZhmUmeTv3hozs10yk/xL4a8a31/uEMzMesTJvwhfOXVSuUMwM+uRopK/pPMkLZf0nqSGNnVXSFotaaWk03LKp0l6Jq27Xm3HTg4g1VUDNnQzy7hir/yXAecAD+UWSpoMnA9MAWYA35NUnVbfCMwG6tNlRpExmJlZNxWV/CNiRUSszFN1FnB7RGyNiBeA1cDRksYCIyPi0UgGzN8KzCwmBjMz675STUI/Hngs53VzWrYtXW9bnpek2SSfEhgzZgxNTU09CqalpYUXly7p0b5tbd68eed6T+MppDeP19LS0uvx9Xduc+XLWnuhdG3uNPlLWgwckKdqbkQsKrRbnrLooDyviJgHzANoaGiIxsbGjoMtoKmpibr6I+GRh3u0f6499tgD3n4bgJ7G087P7+3d45G0uTePNxC4zZUva+2F0rW50+QfEdN7cNxmYELO61pgXVpem6d8QPnnP5/Kxi3byh2GmVmPlWqo593A+ZJqJE0kubH7eESsBzZJOjYd5XMhUOjTQ6/qzTFFM48cz4XH1fXa8Y6v37fXjmVm1hVF9flLOhv4F2A/4F5JT0bEaRGxXNIdwLPAduALEbEj3W0OMB8YBtyXLpn2/YsaeOfdHZ1vaGbWS4pK/hGxEFhYoO5a4No85UuAw4o5bzlNHrdXrx+zZlA1NYOqO9/QzKyXlGq0T0VacPFxTClB8jcz62uZSf69MbFbQ52/uMXMKoPn9jEzyyAnfzOzDHLyNzPLICd/M7MMcvI3M8sgJ38zswxy8jczy6DMJP+B+31hZma9LzPJ/9E1G8odgplZv5GZ5P/cK5vKHYKZWb+RmeS/YGlz5xuZmWVEZpK/mZnt4uRvZpZBTv5mZhnk5G9mlkFO/mZmGeTkb2aWQU7+ZmYZ5ORvZpZBTv5mZhnk5G9mlkFO/mZmGeTk30VzGt9f7hDMzHrNoHIHMBCs/ebHyx2CmVmv8pW/mVkGFZX8JZ0nabmk9yQ15JTXSXpH0pPpclNO3TRJz0haLel6yd+xZWbW14q98l8GnAM8lKduTURMTZeLc8pvBGYD9ekyo8gYzMysm4pK/hGxIiJWdnV7SWOBkRHxaEQEcCsws5gYzMys+0p5w3eipCeAjcCVEfErYDyQ+5VazWlZXpJmk3xKYMyYMTQ1NfUokJaWFqDnvUs9PW85tbS0DMi4i+E2V76stRdK1+ZOk7+kxcABearmRsSiArutB94XERskTQPukjSF/Bk4Cp07IuYB8wAaGhqisbGxs3DzSn5wb/doX4CenrecmpqaBmTcxXCbK1/W2gula3OnyT8ipnf3oBGxFdiari+VtAY4hORKvzZn01pgXXePb2ZmxSnJUE9J+0mqTtcPIrmx+7uIWA9sknRsOsrnQqDQpwczMyuRYod6ni2pGTgOuFfSL9KqE4CnJT0FLAAujog307o5wPeB1cAa4L5iYjAzs+4r6oZvRCwEFuYpvxO4s8A+S4DDijmvmZkVx0/4mpllkJO/mVkGOfmbmWWQk7+ZWQY5+ZuZZZCTv5lZBjn5m5llkJO/mVkGOfmbmWWQk7+ZWQY5+ZuZZZCTv5lZBjn5m5llkJO/mVkGOfmbmWVQxSf/la9s4nO/6Pn395qZVaKKT/4/+vWLbC/4FfFmZtlU8cnfzMzac/I3M8sgJ38zswxy8jczyyAnfzOzDHLyNzPLICf/Tlx/wZHlDsHMrNc5+XfizCPGlTsEM7NeV/HJX1K5QzAz63cqPvlH9Pzx3n2H1/RiJGZm/UdRyV/StyU9J+lpSQsl7Z1Td4Wk1ZJWSjotp3yapGfSuuvVTy/Nh9cM4mdf/Ei5wzAzK4lir/zvBw6LiMOBVcAVAJImA+cDU4AZwPckVaf73AjMBurTZUaRMZREQ90+7D9yaLnDMDMriaKSf0T8MiK2py8fA2rT9bOA2yNia0S8AKwGjpY0FhgZEY9G0h9zKzCzmBjMzKz7BvXisT4L/CRdH0/yZtCqOS3blq63Lc9L0mySTwmMGTOGpqambgfV/PLWbu8D8OaGN3t0vv6ipaVlQMffE25z5ctae6F0be40+UtaDByQp2puRCxKt5kLbAdua90tz/bRQXleETEPmAfQ0NAQjY2NnYXbzv/8cRn8/sVu7zdq9CgaG4/u9n79RVNTEz35eQ1kbnPly1p7oXRt7jT5R8T0juolXQScAZwcu4bWNAMTcjarBdal5bV5ys3MrA8VO9pnBnAZcGZEbM6puhs4X1KNpIkkN3Yfj4j1wCZJx6ajfC4EFhUTQ6n0yyFIZma9pNg+/xuAGuD+dMTmYxFxcUQsl3QH8CxJd9AXImJHus8cYD4wDLgvXUqmn44kNTMrq6KSf0Qc3EHdtcC1ecqXAIcVc14zMytOxT/h21P+2l8zq2QVn/yLmd7BzKxSVXzyNzOz9pz8C/BtYjOrZE7+ZmYZ5ORvZpZBFZ/8Pc7fzKy9ik/+ZmbWnpO/mVkGOfkX4O4iM6tkTv5mZhnk5G9mlkEVn/w9vYOZWXsVn/zNzKw9J/8CfLvXzCpZxSd/j9oxM2uv4pO/mZm15+RfgG8Tm1klc/I3M8sgJ38zswxy8i/At4nNrJI5+ZuZZZCTv5lZBjn5m5llkJO/mVkGOfmbmWWQk38BnhXCzCpZUclf0rclPSfpaUkLJe2dltdJekfSk+lyU84+0yQ9I2m1pOvlyXfMzPpcsVf+9wOHRcThwCrgipy6NRExNV0uzim/EZgN1KfLjCJjMDOzbioq+UfELyNie/ryMaC2o+0ljQVGRsSjkXzLyq3AzGJiMDOz7uvNPv/PAvflvJ4o6QlJD0o6Pi0bDzTnbNOclpmZWR8a1NkGkhYDB+SpmhsRi9Jt5gLbgdvSuvXA+yJig6RpwF2SppB/1oSCE2hKmk3SRcSYMWNoamrqLNx2mpu3dnsfgDfe2NCj8/UXLS0tAzr+nnCbK1/W2gula3OnyT8ipndUL+ki4Azg5LQrh4jYCmxN15dKWgMcQnKln9s1VAus6+Dc84B5AA0NDdHY2NhZuO00bVwOv1/b7f323XdfGhsbur1ff9HU1ERPfl4Dmdtc+bLWXihdm4sd7TMDuAw4MyI255TvJ6k6XT+I5Mbu7yJiPbBJ0rHpKJ8LgUXFxFA6ntHfzCpXp1f+nbgBqAHuT0dsPpaO7DkB+DtJ24EdwMUR8Wa6zxxgPjCM5B7BfW0PamZmpVVU8o+IgwuU3wncWaBuCXBYMec1M7Pi+AlfM7MMcvI3M8sgJ/+CPOuEmVUuJ38zswxy8jczyyAnfzOzDHLyNzPLICd/M7MMcvIvwF8xY2aVzMnfzCyDnPzNzDLIyd/MLIOc/M3MMsjJ38wsg5z8zcwyyMnfzCyDnPzNzDLIyd/MLIOc/M3MMsjJ38wsg5z8C/DUPmZWyZz8zcwyyMnfzCyDnPzNzDLIyd/MLIOc/M3MMsjJvwB/k5eZVTInfzOzDCoq+Uu6RtLTkp6U9EtJ43LqrpC0WtJKSafllE+T9Exad73ka2wzs75W7JX/tyPi8IiYCtwD/C2ApMnA+cAUYAbwPUnV6T43ArOB+nSZUWQMZmbWTUUl/4jYmPNyTyDS9bOA2yNia0S8AKwGjpY0FhgZEY9GRAC3AjOLiaFUIjrfxsxsoBpU7AEkXQtcCPwR+GhaPB54LGez5rRsW7retrzQsWeTfEoAaJG0sodh7gu80Z0dbgZuvqiHZ+sfut3mCuA2V76stReKb/OB+Qo7Tf6SFgMH5KmaGxGLImIuMFfSFcAlwNfJPzVOdFCeV0TMA+Z1FmNnJC2JiIZijzOQuM3ZkLU2Z629ULo2d5r8I2J6F4/1I+BekuTfDEzIqasF1qXltXnKzcysDxU72qc+5+WZwHPp+t3A+ZJqJE0kubH7eESsBzZJOjYd5XMhsKiYGMzMrPuK7fP/pqRJwHvAi8DFABGxXNIdwLPAduALEbEj3WcOMB8YBtyXLqVWdNfRAOQ2Z0PW2py19kKJ2qzwsBYzs8zxE75mZhnk5G9mlkEVnfwlzUinl1gt6fJyx1MMST+Q9JqkZTlloyTdL+n59N99cuoG/PQakiZI+h9JKyQtl/TXaXnFtlvSUEmPS3oqbfPVaXnFthlAUrWkJyTdk76u9PauTWN9UtKStKxv2xwRFbkA1cAa4CBgCPAUMLnccRXRnhOAo4BlOWXXAZen65cD30rXJ6ftrQEmpj+H6rTuceA4kmcu7gM+Vu62ddDmscBR6foIYFXatoptdxrf8HR9MPBr4NhKbnMa65dJhovfk5G/7bXAvm3K+rTNlXzlfzSwOiJ+FxHvAreTTDsxIEXEQ8CbbYrPAv49Xf93dk2VMeCn1wCIiPUR8dt0fROwguSJ8IptdyRa0peD0yWo4DZLqgU+Dnw/p7hi29uBPm1zJSf/8cBLOa87nEpigBoTybMTpP/un5YXavt4ujG9Rn8iqQ44kuRKuKLbnXaBPAm8BtwfEZXe5n8GvkoyZLxVJbcXkjf0X0pamk5jA33c5qLn9unHujWVRIXplek1+gtJw4E7gS9FxMYOujUrot2RPBMzVdLewEJJh3Ww+YBus6QzgNciYqmkxq7skqdswLQ3x4cjYp2k/YH7JT3XwbYlaXMlX/kXmmKikryafvQj/fe1tLxipteQNJgk8d8WET9Niyu+3QAR8RbQRDLteaW2+cPAmZLWknTNniTpP6jc9gIQEevSf18DFpJ0U/dpmys5+f8GqJc0UdIQku8XuLvMMfW2u4HWuUcvYtdUGRUxvUYa4y3Aioj4vzlVFdtuSfulV/xIGgZMJ5k2pSLbHBFXRERtRNSR/B99ICI+RYW2F0DSnpJGtK4DpwLL6Os2l/uudykX4HSSESJrSGYhLXtMRbTlx8B6dk2L/RfAaOC/gefTf0flbD83bfdKckYAAA3pH9oa4AbSp7z74wJ8hORj7NPAk+lyeiW3GzgceCJt8zLgb9Pyim1zTryN7BrtU7HtJRmB+FS6LG/NTX3dZk/vYGaWQZXc7WNmZgU4+ZuZZZCTv5lZBjn5m5llkJO/mVkGOfmbmWWQk7+ZWQb9fzGLpu5HoLSTAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# create taxi environment\n",
    "env = gym.make(\"Taxi-v3\")\n",
    "\n",
    "# create a Q Learning agent\n",
    "agent = QLearningAgent(alpha=0.25, epsilon=0.2, gamma=0.99, \n",
    "                       get_possible_actions=lambda s : range(env.nA))\n",
    "\n",
    "#train agent and get rewards for episodes\n",
    "rewards = train_agent(env, agent, episode_cnt = 5000)\n",
    "\n",
    "#plot reward graph\n",
    "plot_rewards(\"Taxi\", rewards, 'Q Learning')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "We see that Q Agent learns the optimal policy by 500 episodes of training in case of Cliff World. As the learnt policy does not have any exploration, the agent learns the shortest route of walking across the maze, right on the row just above cliff. As the actions are deterministic and there is no exploration, the agent has no chance of falling into cliff when it takes a RIGHT action in a cell next to the cliff. Therefore, it learns to take the shortest route towards goal. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
